Regret Minimization in MDPs with Options without Prior Knowledge
Authors
Abstract
Motivations. "Flat" RL makes it difficult to learn complex behaviours (e.g., a sequence of subgoals), whereas humans abstract from low-level actions. Hierarchical RL decomposes large problems into smaller ones by imposing constraints on the value function or the policy. One possible implementation is options [Sutton et al., 1999]. Empirically, introducing options in an MDP can speed up learning but can also be harmful [Jong et al., 2008], which points to a lack of theoretical motivation for, and understanding of, options.
Theoretical analysis of learning with options. Adding options does not just reduce the space of stationary policies; exploration is also greatly affected.
Similar resources
Exploration-Exploitation in MDPs with Options
While a large body of empirical results show that temporally-extended actions and options may significantly affect the learning performance of an agent, the theoretical understanding of how and when options can be beneficial in online reinforcement learning is relatively limited. In this paper, we derive an upper and lower bound on the regret of a variant of UCRL using options. While we first a...
A Regret Minimization Approach in Product Portfolio Management with respect to Customers’ Price-sensitivity
In an uncertain and competitive environment, product portfolio management (PPM) becomes more challenging for manufacturers to decide what to make and establish the most beneficial product portfolio. In this paper, a novel approach in PPM is proposed in which the environment uncertainty, competitors’ behavior and customer’s satisfaction are simultaneously considered as the most important criteri...
Generalised Entropy MDPs and Minimax Regret
Bayesian methods suffer from the problem of how to specify prior beliefs. One interesting idea is to consider worst-case priors. This requires solving a stochastic zero-sum game. In this paper, we extend well-known results from bandit theory in order to discover minimax-Bayes policies and discuss when they are practical.
Pricing Exotic Derivatives Using Regret Minimization
We price various financial instruments, which are classified as exotic options, using the regret bounds of an online algorithm. In addition, we derive a general result, which upper bounds the price of any derivative whose payoff is a convex function of the final asset price. The market model used is adversarial, making our price bounds robust. Our results extend the work of [9], which used regr...
Minimizing Regret in Dynamic Decision Problems
The menu-dependent nature of regret-minimization creates subtleties in applying regret-minimization to dynamic decision problems. Firstly, it is not clear whether forgone opportunities should be included in the menu. We explain commonly observed behavioral patterns as minimizing regret when forgone opportunities are present, and also show how the treatment of forgone opportunities affects behav...
Publication date: 2017